Unsupervised Profiling Methods for Fraud Detection
Abstract
Credit card fraud falls broadly into two categories: behavioural fraud and application fraud. Application fraud occurs when individuals obtain new credit cards from issuing companies using false personal information and then spend as much as possible in a short space of time. However, most credit card fraud is behavioural and occurs when details of legitimate cards have been obtained fraudulently and sales are made on a 'Cardholder Not Present' basis. These sales include telephone sales and e-commerce transactions where only the card details are required. In this paper, we are concerned with detecting behavioural fraud through the analysis of longitudinal data. These data usually consist of credit card transactions over time, but can include other variables, both static and longitudinal. Statistical methods for fraud detection are often classification (supervised) methods that discriminate between known fraudulent and non-fraudulent transactions; however, these methods rely on accurate identification of fraudulent transactions in historical databases – information that is often in short supply or non-existent. We are particularly interested in unsupervised methods that do not use this information but instead detect changes in behaviour or unusual transactions. We discuss two methods for unsupervised fraud detection in credit data in this paper and apply them to some real data sets. Peer group analysis is a new tool for monitoring behaviour over time in data mining situations. In particular, the tool detects individual accounts that begin to behave in a way distinct from accounts to which they had previously been similar. Each account is selected as a target account and is compared with all other accounts in the database, using either external comparison criteria or internal criteria summarizing earlier behaviour patterns of each account. Based on this comparison, a peer group of accounts most similar to the target account is chosen. 
The behaviour of the peer group is then summarized at each subsequent time point, and the behaviour of the target account compared with the summary of its peer group. Those target accounts exhibiting behaviour most different from their peer group summary behaviour are flagged as meriting closer investigation. Break point analysis is a tool that identifies changes in spending behaviour based on the transaction information in a single account. Recent transactions are compared with previous spending behaviour to detect features such as rapid spending and an increase in the level of spending, features that would not necessarily be captured by outlier detection.

Introduction

In the fight against fraud, actions fall under two broad categories: fraud prevention and fraud detection. Fraud prevention describes measures to stop fraud occurring in the first place. These include PINs for bankcards, Internet security systems for credit card transactions and passwords on telephone bank accounts. In contrast, fraud detection involves identifying fraud as quickly as possible once it has been perpetrated. Detection comes into play once prevention has failed, and it must run continuously, because we will usually be unaware that prevention has failed. In this article we are concerned solely with fraud detection.

Fraud detection must evolve continuously. Once criminals realise that a certain mode of fraudulent behaviour can be detected, they will adapt their strategies and try others. Of course, new criminals are also attempting to commit fraud; many of them will not be aware of the detection methods that have succeeded in the past and will adopt strategies that lead to identifiable frauds. This means that earlier detection tools still need to be applied alongside the latest developments.

Statistical fraud detection methods may be ‘supervised’ or ‘unsupervised’.
In supervised methods, models are trained to discriminate between fraudulent and non-fraudulent behaviour, so that new observations can be assigned to classes in a way that optimises some measure of classification performance. Of course, this requires one to be confident about the true classes of the original data used to build the models; uncertainty is introduced when legitimate transactions are mistakenly reported as fraud or when fraudulent observations are not identified as such. Supervised methods require examples of both classes, and they can only be used to detect types of fraud that have previously occurred. These methods also suffer from the problem of unbalanced class sizes: in fraud detection problems, legitimate transactions generally far outnumber fraudulent ones, and this imbalance can cause misspecification of models. Brause et al. (1999) say that, in their database of credit card transactions, ‘the probability of fraud is very low (0.2%) and has been lowered in a preprocessing step by a conventional fraud detecting system down to 0.1%.’ Hassibi (2000) remarks ‘Out of some 12 billion transactions made annually, approximately 10 million – or one out of every 1200 transactions – turn out to be fraudulent.’

In contrast, unsupervised methods simply seek those accounts, customers, etc. whose behaviour is ‘unusual’. We model a baseline distribution that represents normal behaviour and then attempt to detect observations that show the greatest departure from this norm. These can then be examined more closely. Outliers are a basic form of nonstandard observation that can be used for fraud detection. This leads us to note the fundamental point that we can seldom be certain, by statistical analysis alone, that a fraud has been perpetrated. Rather, the analysis should be regarded as alerting us to the fact that an observation is anomalous, or more likely to be fraudulent than others, so that it can then be investigated in more detail.
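The baseline-and-departure idea can be illustrated with a minimal sketch for one account and one variable (transaction amount). As assumptions not taken from the paper, the baseline is summarised by the median of past amounts and the spread by the median absolute deviation (MAD), which are less distorted by occasional large values than the mean and standard deviation; `departure_scores` is a hypothetical helper and the amounts are made up.

```python
from statistics import median

def departure_scores(history, new_values):
    """Score new values by their departure from a baseline fitted to `history`.

    Hypothetical helper: the baseline centre is the median of past values and the
    spread is the median absolute deviation (MAD). Higher scores are more suspicious.
    """
    centre = median(history)
    mad = median([abs(x - centre) for x in history])
    scale = mad if mad > 0 else 1.0  # fall back to unit scale if history is constant
    return [abs(x - centre) / scale for x in new_values]

# Illustrative (made-up) past amounts, in pounds, for one account.
history = [20.0, 35.0, 25.0, 40.0, 30.0]
print(departure_scores(history, [32.0, 400.0]))  # → [0.4, 74.0]
```

The £400 purchase scores far higher than the £32 one and would be passed on for closer examination; the score signals only that the amount is unusual for this account, not that it is fraudulent.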
One can think of the objective of the statistical analysis as being to return a suspicion score (where we regard a higher score as more suspicious than a lower one). The higher the score, the more unusual the observation, or the more it resembles previously fraudulent values. Because fraud can be perpetrated in many different ways and in many different scenarios, there are many different ways of computing suspicion scores. We can compute a suspicion score for each account in the database and update these scores as time progresses. By ordering accounts according to their suspicion scores, we can focus attention on those with the highest scores, or on those that exhibit a sudden increase in suspicion score. If we have a limited budget, so that we can only afford to investigate a certain number of accounts or records, we can concentrate investigation on those thought most likely to be fraudulent.

Credit Card Fraud

Credit card fraud is perpetrated in various ways but can be broadly categorised as application, ‘missing in post’, stolen/lost card, counterfeit card and ‘cardholder not present’ fraud. Application fraud arises when individuals obtain new credit cards from issuing companies using false personal information; application fraud totalled £10.2 million in 2000 (Source: APACS) and is the only type of fraud that actually declined between 1999 and 2000. ‘Missing in post’ fraud (£17.3m in 2000) describes the interception of credit cards in the post by fraudsters before they reach the cardholder. Stolen or lost cards accounted for £98.9 million in fraud in 2000, but the greatest percentage increases between 1999 and 2000 were in counterfeit card fraud (£50.3m to £102.8m) and ‘cardholder not present’ (i.e. postal, phone and Internet transactions) fraud (£29.3m to £56.8m). To commit these last two types of fraud it is necessary to obtain the details of the card without the cardholder’s knowledge.
This is done in various ways, including employees using an unauthorised ‘swiper’ that downloads the encoded information onto a laptop computer, and hackers obtaining credit card details by intruding into companies’ computer networks. A counterfeit card is then made, or the card details are simply used for phone, postal or Internet transactions.

Supervised methods can be used to discriminate between those accounts or transactions known to be fraudulent and those known (or at least presumed) to be legitimate. For example, traditional credit scorecards (Hand and Henley, 1997) are used to detect customers who are likely to default, and the reasons for default may include fraud. Such scorecards are based on the details given on the application forms, and perhaps also on other details, such as bureau information. Classification techniques, such as statistical discriminant analysis and neural networks, can be used to discriminate between fraudulent and non-fraudulent transactions to give transactions a suspicion score. However, information about fraudulent transactions may not be available, and in these cases we apply unsupervised methods to attempt to detect fraud. These methods are scarce in the literature and less popular than supervised methods in practice, because their suspicion scores reflect a propensity to act anomalously when compared with previous behaviour. This differs from suspicion scores obtained using supervised techniques, which are guided to reflect a propensity to commit fraud in a manner already discovered. The idea behind suspicion scores from unsupervised methods is that unusual behaviour or transactions can often be indicators of fraud. An advantage of unsupervised methods over supervised methods is that previously undiscovered types of fraud may be detected, whereas supervised methods are only trained to discriminate between legitimate transactions and previously known fraud.
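The use of suspicion scores described earlier (ordering accounts by score, investigating as many as a limited budget allows, and watching for sudden increases) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the budget is assumed to be a simple count of accounts, the `jump` threshold is a hypothetical tuning parameter, and the function names and scores are invented for the example.

```python
def investigation_queue(scores, budget):
    """Return the `budget` accounts with the highest suspicion scores, most suspicious first."""
    return sorted(scores, key=scores.get, reverse=True)[:budget]

def sudden_increases(previous, current, jump):
    """Return accounts whose score rose by more than `jump` since the previous update.

    Assumption: an account with no previous score is treated as having scored 0.
    """
    return sorted(
        account for account, score in current.items()
        if score - previous.get(account, 0.0) > jump
    )

# Illustrative (made-up) suspicion scores at two successive updates.
previous = {"A": 1.0, "B": 4.0, "C": 2.5, "D": 0.5}
current = {"A": 1.2, "B": 4.1, "C": 6.0, "D": 0.7}

print(investigation_queue(current, budget=2))         # → ['C', 'B']
print(sudden_increases(previous, current, jump=2.0))  # → ['C']
```

Account B stays near the top of the queue because its score is persistently high, while C is picked up by both rules because its score has jumped; the two views are complementary.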
Unsupervised methods and their application to fraud detection

As we mentioned above, the emphasis in fraud detection methodology has been on supervised techniques. In particular, neural networks have proved popular, predictably perhaps, given the attention they have received. Researchers who have used neural networks for supervised credit card fraud detection include Ghosh and Reilly (1994), Aleskerov et al. (1997), Dorronsoro et al. (1997) and Brause et al. (1999). However, unsupervised credit card fraud detection has not received attention in the literature.

Unsupervised fraud detection methods have been researched in the detection of computer intrusion (hacking). Here, profiles are trained on the combinations of commands that a user uses most frequently in their account. If a hacker gains illegal access to the account, the intrusion is detected by the presence of sequences of commands that are not in the profile of commands typed by the legitimate user. Qu, Vetter et al. (1998) use probabilities of events to define the profile, while Lane and Brodley (1998), Forrest et al. (1996) and Kosoresow and Hofmeyr (1997) use similarity of sequences that can be interpreted in a probabilistic framework.

Unsupervised methods are useful in applications where there is no prior knowledge of the class of the observations in a data set. For example, we may not be able to know for sure which transactions in a database are fraudulent and which are legitimate. In these situations, unsupervised methods can be used to find groups or outliers in the data. Essentially, we collect data to provide a summary of the system that we are studying. Once we have a summary of the behaviour of the system, we can identify those observations that do not fit in with this behaviour, i.e. anomalous observations. This is our aim in using unsupervised statistical techniques for fraud detection. The most popular unsupervised method used in data mining is clustering.
This technique is used to find natural groupings of observations in the data and is especially useful in market segmentation. However, cluster analysis can suffer from a bad choice of metric (the way we scale, transform and combine variables to measure the ‘distance’ between observations); for example, it can be difficult to combine categorical and continuous variables in a good clustering metric. Observations may also cluster differently on some subsets of variables than on others, so that there may be more than one valid clustering of a data set.

We can use unsupervised methods such as clustering to help us form local models from which we can find local outliers in the data. In the context of fraud detection, a global outlier is a transaction anomalous to the entire data set; for example, a purchase of several thousand pounds would be a global outlier if all other transactions in the database were considerably less than that amount. Local outliers are transactions that are anomalous when compared with subgroups of the data. Local outlier detection is effective in situations where the population is heterogeneous; this is true of credit card transaction data, where spending behaviour varies between accounts in the amounts spent and the purchases made. If we can identify the spending behaviour of a particular account, then a transaction is a local outlier if it is anomalous relative to spending in that account (or in accounts similar to it), but not necessarily anomalous relative to the entire population of transactions. For example, a transaction of a thousand pounds in an account where, historically, all transactions have been under a hundred pounds might be considered a local outlier; however, the same transaction may not be considered unusual in a high-spending account, and so would not necessarily be a global outlier. The fundamental challenge is in the formation of the local model, which can be achieved in a variety of ways.
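The thousand-pound example above can be made concrete with a sketch that compares a global score (against every transaction in the database) with a local score (against one account's own history). It assumes a z-score, the distance from the mean in units of standard deviation, as the measure of anomaly, and a hypothetical cut-off of 3; neither choice comes from the paper, and the amounts are invented for illustration.

```python
from statistics import mean, stdev

def z_score(value, reference):
    """Standardized distance of `value` from the mean of `reference`."""
    sd = stdev(reference)
    return 0.0 if sd == 0 else (value - mean(reference)) / sd

# Illustrative (made-up) transaction amounts in pounds.
low_spender = [40, 60, 50, 70, 30]        # all under £100
high_spender = [900, 1100, 1000, 1200, 800]
population = low_spender + high_spender   # every transaction in the toy database

THRESHOLD = 3.0  # hypothetical cut-off on the standardized distance
purchase = 1000

local_low = z_score(purchase, low_spender)    # about 60: a local outlier for this account
local_high = z_score(purchase, high_spender)  # 0.0: typical for a high spender
global_z = z_score(purchase, population)      # about 0.93: not a global outlier

print(local_low > THRESHOLD, local_high > THRESHOLD, global_z > THRESHOLD)  # → True False False
```

Pooling the low- and high-spending accounts inflates the global standard deviation, which is why the same purchase looks ordinary globally but extreme locally; this heterogeneity is what motivates local models.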
One way is through cluster analysis. Here, legitimate transactions from all accounts are clustered into groups with similar characteristics. The local model, or profile, of a particular account is then determined by the clusters to which its transactions are allocated. If a future transaction from the account is allocated to a cluster not in the account profile, an alarm is raised for that transaction. Care must be exercised in choosing the variables and metrics on which to cluster. Nearest-neighbour methods can also be employed to combine transaction information from accounts that exhibit similar behaviour.

We have developed Peer Group Analysis as a tool that uses local models of spending behaviour over time to detect changes in spending within accounts; we describe an application of Peer Group Analysis to fraud detection below. We follow this with a description of Break Point Analysis, in which a local model is created and updated by drawing information from transactions within the same account. Sequences of transactions within that account are compared with this local model to indicate changes in spending behaviour.

Peer Group Analysis

We propose Peer Group Analysis (Bolton and Hand, 2001) as a candidate method for unsupervised fraud detection. Peer group analysis is a new tool for monitoring behaviour over time in data mining situations. In particular, the tool detects individual objects that begin to behave in a way distinct from objects to which they had previously been similar. Each object is selected as a target object and is compared with all other objects in the database, using either external comparison criteria or internal criteria summarizing earlier behaviour patterns of each object. Based on this comparison, a peer group of objects most similar to the target object is chosen. The behaviour of the peer group is then summarized at each subsequent time point, and the behaviour of the target object compared with the summary of its peer group.
Those target objects exhibiting behaviour most different from their peer group summary behaviour are flagged as meriting closer investigation. The tool is intended to be part of the data mining process, involving cycling between the detection of objects that behave in anomalous ways and the detailed examination of those objects. Several aspects of peer group analysis can be tuned to the particular application, including the size of the peer group, the width of the moving behaviour window being used, the way the peer group is summarised, and the measures of difference between the target object and its peer group summary.

The distinguishing feature of Peer Group Analysis (PGA) lies in its focus on local patterns rather than global models (Hand et al., 2000; Hand, Mannila, and Smyth, 2001): a sequence may not evolve unusually when compared with the whole population of sequences but may display unusual properties when compared with its peer group. That is, it may begin to deviate in behaviour from objects to which it has previously been similar.

Let us suppose that we have observations on N objects, where each observation is a sequence of d values, represented by a vector, xi, of length d. The jth value of the ith observation, xij, occurs at a fixed time point tj. Let PGi(tj) = {some subset of the observations xp, p ≠ i, which show behaviour similar to that of xi at time tj}. Then PGi(tj) is the peer group of object i at time tj. The parameter npeer is the number of objects in the peer group and effectively controls the sensitivity of the peer group analysis: the size of npeer reflects how local a model we require. Of course, if npeer is chosen to be too small, then the behaviour of the peer group may be too sensitive to random errors and thus inaccurate. Let Sij be a statistic summarizing the behaviour of the ith observation at time j. We will define similarity between objects in terms of their measures, Sij.
This measure could be a sequence of observations preceding time point j, or it could be some statistical summary of these observations, such as a moving average or a trend. We define a (dis)similarity metric D(Si1, Sj1), j ≠ i, on the Si1 to order objects according to how similar their behaviour at t1 is to that of the target object, xi. The npeer objects most similar to the target object comprise the peer group, PGi. The choice of a suitable metric depends on the data to be analyzed; we have used a two-stage variant of the Euclidean distance metric in this paper, since the example data sets here contain continuous variables, but other metrics will be more suitable for categorical data or for data with variables on greatly differing scales of measurement. Different metrics may well yield different results (as with cluster analysis), so they are worth exploring.

Once we have found the peer group for the target observation xi, we can calculate peer group statistics, Pij. These will generally be summaries of the values Spj for the members p of the peer group. The principle here is that the peer group initially provides a local model, Pi1, for Si1, thus characterizing the local behaviour of xi at time t1, and will subsequently provide models, Pij, for Sij at time tj, j > 1. If our target observation, Sik, deviates ‘significantly’ from its peer group model Pik at time tk, then we conclude that our target is no longer behaving like its peers at time tk. If the departure is large enough, the target observation will be flagged as worthy of investigation. To measure the departure of the target observation from its peer group, we calculate its standardized distance from the peer group model; the example we use here is a standardized distance from the centroid of the peer group, based on a t-statistic. The centroid of the peer group is given by

$$P_{ij} = \frac{1}{\mathit{npeer}} \sum_{p \in PG_i(t_1)} S_{pj}, \qquad j \ge 1,\ p \ne i,$$

where PGi(t1) is the peer group calculated at time t1.
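The peer-group steps described above (choose the npeer objects nearest the target at t1, then compare the target with its peer-group centroid at each later time using a t-like standardized distance) can be sketched as follows. This is a minimal sketch under stated assumptions: each Sij is a single number, peers are chosen by plain Euclidean distance on Si1 (in one dimension, the absolute difference) rather than the two-stage variant used in the paper, and the standardization uses the peer-group sample variance with divisor npeer − 1, which is how we read the variance definition that follows. The function names and data are hypothetical.

```python
import math
from statistics import mean, stdev

def peer_group(S, i, npeer):
    """Return the indices of the npeer objects p != i closest to object i at t1.

    Hypothetical helper: distance is |S[p][0] - S[i][0]|, i.e. the Euclidean
    distance for a scalar summary (not the paper's two-stage variant).
    """
    others = [p for p in range(len(S)) if p != i]
    others.sort(key=lambda p: abs(S[p][0] - S[i][0]))
    return others[:npeer]

def standardized_departures(S, i, npeer):
    """Return (S_ij - P_ij) / sqrt(V_ij) for every time j, with PG_i fixed at t1."""
    if npeer < 2:
        raise ValueError("npeer must be at least 2 to estimate a variance")
    peers = peer_group(S, i, npeer)
    departures = []
    for j in range(len(S[i])):
        values = [S[p][j] for p in peers]
        centroid = mean(values)   # P_ij: peer-group centroid at time j
        spread = stdev(values)    # sqrt(V_ij): sample variance with divisor npeer - 1
        diff = S[i][j] - centroid
        if spread == 0:
            # Degenerate peer group with no spread: any nonzero difference is unbounded.
            departures.append(0.0 if diff == 0 else math.copysign(math.inf, diff))
        else:
            departures.append(diff / spread)
    return departures

# Illustrative (made-up) summaries S[i][j]: rows are objects, columns are times t1..t4.
S = [
    [10, 11, 30, 45],      # target: starts with the low spenders, then drifts upward
    [11, 12, 11, 12],
    [9, 10, 10, 11],
    [12, 11, 12, 10],
    [100, 98, 102, 101],
    [99, 101, 100, 103],
]
print([round(t, 2) for t in standardized_departures(S, 0, npeer=3)])
# → [-0.44, 0.0, 19.0, 34.0]
```

In the toy data the target's later values (30 and 45) lie well inside the range of the whole population, yet its standardized departure from its three peers jumps to 19 and 34: the kind of local deviation that peer group analysis is designed to flag.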
The variance of the peer group is then

$$V_{ij} = \frac{1}{\mathit{npeer} - 1} \sum_{p \in PG_i(t_1)} \left(S_{pj} - P_{ij}\right)^2, \qquad j \ge 1,\ p \ne i.$$

The square root of this can be used to standardize the difference between the target Sij and the peer group summary Pij, yielding